Using a maximum entropy-based tagger to improve a very fast vine parser
Abstract
In this short paper, an off-the-shelf maximum entropy-based POS tagger is used as a partial parser to improve the accuracy of an extremely fast linear-time dependency parser that provides state-of-the-art results in multilingual unlabeled POS sequence parsing.
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although they originated in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...
Cross-lingual Adaptation as a Baseline: Adapting Maximum Entropy Models to Bulgarian
We describe our efforts in adapting five basic natural language processing components to Bulgarian: sentence splitter, tokenizer, part-of-speech tagger, chunker, and syntactic parser. The components were originally developed for English within OpenNLP, an open-source, maximum entropy-based machine learning toolkit, and were retrained on manually annotated training data from the BulTreeBank...
Multi-lingual Dependency Parsing at NAIST
In this paper, we present a framework for multi-lingual dependency parsing. Our bottom-up deterministic parser adopts Nivre’s algorithm (Nivre, 2004) with a preprocessor. Support Vector Machines (SVMs) are utilized to determine the word dependency attachments. Then, a maximum entropy method (MaxEnt) is used for determining the label of the dependency relation. To improve the performance of the ...
Part-of-Speech Tagging and Chunking with Maximum Entropy Model
This paper describes our work on part-of-speech (POS) tagging and chunking for Indian languages, for the SPSAL shared task contest. We use a Maximum Entropy (ME) based statistical model. The tagger makes use of morphological and contextual information about words. Since only a small labeled training set is provided (approximately 21,000 words for all three languages), an ME-based approach does not y...
Automatically Adapting an NLP Core Engine to the Biology Domain
Background: Rather than specifying rules, constraints and lexicons for NLP systems manually, we advocate a procedure for automatically acquiring linguistic knowledge using machine learning (ML) methods. In order to demonstrate how feasible this approach is, we automatically adapt OpenNLP, an open source ML-based NLP tool suite, to the sublanguage domain of biology. Results: In the first evaluat...